Abalone_Pinto_v5beta_8673.fa



filtered_P_Ab_summary report.pdf








24109 ESTs
sequence (6).fasta





29357 Nucleotide (mRNA) sequences

sequence (7).fasta


Combined: 24109 + 29357 = 53466 sequences (Haliotis_NCBI_combined.fasta)
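The notes don't say how the two downloads were joined; a plain concatenation is enough. The Python below is a minimal sketch with hypothetical function names. It appends FASTA files into one and counts records by their `>` header lines. That gives a quick check that 24109 + 29357 = 53466 matches the `total seq` reported by cd-hit-est.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def combine_fasta(input_paths, output_path):
    """Concatenate FASTA files into output_path and return the combined record count."""
    with open(output_path, "w") as out:
        for path in input_paths:
            text = Path(path).read_text()
            # Guard against a file whose last line lacks a newline, which would
            # otherwise glue its final sequence onto the next file's first header.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return count_fasta_records(output_path)
```

For example, `combine_fasta(["sequence (6).fasta", "sequence (7).fasta"], "Haliotis_NCBI_combined.fasta")` should return 53466 if the downloads hold the counts listed above.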





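CD-HIT-EST


Running the combined file through cd-hit-est. No -c was given, so the default 0.9 sequence identity threshold applies; -M 2500 sets the memory limit to 2500 MB.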

./cd-hit-est -i /Volumes/Bay4\ scratch/temp/Haliotis_NCBI_combined.fasta -o /Volumes/Bay4\ scratch/temp/Haliotis_comboNCBI_cdhit -M 2500


robertsmac:cd-hit-v4.5.4-2011-03-07 sr320$ ./cd-hit-est -i /Volumes/Bay4\ scratch/temp/Haliotis_NCBI_combined.fasta -o /Volumes/Bay4\ scratch/temp/Haliotis_comboNCBI_cdhit -M 2500
================================================================
Program: CD-HIT, V4.5.4, Feb 23 2012, 11:03:06
Command: ./cd-hit-est -i
         /Volumes/Bay4 scratch/temp/Haliotis_NCBI_combined.fasta
         -o /Volumes/Bay4 scratch/temp/Haliotis_comboNCBI_cdhit
         -M 2500

Started: Fri Apr 20 15:14:24 2012
================================================================
                            Output                             
----------------------------------------------------------------
total seq: 53466
longest and shortest : 11166 and 28
Total letters: 24083836
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 30M
Buffer          : 1 X 15M = 15M
Table           : 1 X 17M = 17M
Miscellaneous   : 4M
Total           : 68M

Table limit with the given memory limit:
Max number of representatives: 4194304
Max number of word counting entries: 303948411
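The first lines of the log summarise the input: number of sequences, longest and shortest lengths, and total letters. The Python below is a minimal sketch with hypothetical helper names that are not part of CD-HIT. It computes the same four numbers from a FASTA file as an independent check on the combined input. It assumes sequence length is just the character count of the non-header lines. That may not match exactly how CD-HIT handles unusual characters.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Multi-line sequences are joined and blank lines are skipped.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_stats(lines):
    """Summarise a FASTA the way the header of the cd-hit-est log does."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    if not lengths:
        return {"total_seq": 0, "longest": 0, "shortest": 0, "total_letters": 0}
    return {
        "total_seq": len(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "total_letters": sum(lengths),
    }
```

Usage: `with open("Haliotis_NCBI_combined.fasta") as fh: print(fasta_stats(fh))`, then compare against total seq 53466, longest/shortest 11166 and 28, and total letters 24083836.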


Clustering reduced the 53466 input sequences to 40521 representative sequences.
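cd-hit-est writes the representative sequences to the -o file and a companion cluster listing with a .clstr suffix, presumably Haliotis_comboNCBI_cdhit.clstr here. There is one representative per cluster, so counting clusters is one way to confirm the 40521 figure. The sketch below uses a hypothetical function name. It assumes the usual .clstr layout: a `>Cluster N` line followed by member lines such as `0	500nt, >seqA... *`, where the trailing `*` marks the representative.

```python
def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into a list of clusters.

    Each cluster is a dict holding the representative's name and the member count.
    """
    clusters = []
    for raw in lines:
        line = raw.rstrip()
        if line.startswith(">Cluster"):
            clusters.append({"representative": None, "size": 0})
        elif line and clusters:
            clusters[-1]["size"] += 1
            if line.endswith("*"):
                # The sequence name sits between '>' and the '...' truncation marker.
                name = line.split(">", 1)[1].split("...", 1)[0]
                clusters[-1]["representative"] = name
    return clusters
```

Every input sequence lands in exactly one cluster. So for this run `len(parse_clstr(fh))` should be 40521 and the sum of the cluster sizes should be 53466.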


----

18,444 of these are longer than 400 bp

Haliotis_comboNCBI_cdhit selection.fa
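The notes don't record how the >400 bp subset was extracted. The Python below is a minimal sketch that assumes a strict "longer than 400 bp" cutoff applied to the cd-hit-est representatives. It reads a FASTA, keeps records above the threshold, and writes them out wrapped at 60 characters per line. The function names and line width are hypothetical choices.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=400):
    """Keep (header, sequence) records strictly longer than min_len."""
    return [(h, s) for h, s in records if len(s) > min_len]


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
```

Applied to the cd-hit-est output, the kept count should match the 18,444 noted above if the same cutoff was used.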


BLAST run on BFX.

Output: Haliotis_NCBI_combo.txt
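The notes don't say which BLAST program, database, or output format produced Haliotis_NCBI_combo.txt. The sketch below assumes tabular output from -outfmt 6 with its default 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. Under that assumption, it keeps the highest-bitscore hit for each query. The function name is hypothetical.

```python
import csv


def best_hits(lines):
    """Return {query: (subject, evalue, bitscore)}, keeping the top-bitscore hit.

    Assumes BLAST -outfmt 6 rows with the default 12 tab-separated columns;
    comment lines starting with '#' (as in -outfmt 7) are skipped.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Choosing by bitscore, not by row order, keeps the result correct even if the file has been sorted or merged after the run.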